A Large-Scale Evaluation of Pre-editing Strategies for Improving User-Generated Content Translation

Authors

  • Violeta Seretan
  • Pierrette Bouillon
  • Johanna Gerlach
Abstract

User-generated content represents an increasing share of the information available today. To make this type of content instantly accessible in another language, the ACCEPT project focuses on developing pre-editing technologies for correcting the source text in order to increase its translatability. Linguistically informed pre-editing rules have been developed for English and French for the two domains considered by the project, namely the technical domain and the healthcare domain. In this paper, we present the evaluation experiments carried out to assess the impact of the proposed pre-editing rules on translation quality. Results from a large-scale evaluation campaign show that pre-editing does indeed help attain better translation quality for a high proportion of the data, and that the difference from the number of cases where an adverse effect is observed is statistically significant. The ACCEPT pre-editing technology is freely available online and can be used in any Web-based environment to enhance the translatability of user-generated content so that it reaches...

Similar resources

The ACCEPT Portal: An Online Framework for the Pre-editing and Post-editing of User-Generated Content

With the development of Web 2.0, a lot of content is nowadays generated online by users. Due to its characteristics (e.g., use of jargon and abbreviations, typos, grammatical and style errors), the user-generated content poses specific challenges to machine translation. This paper presents an online platform devoted to the pre-editing of user-generated content and its post-editing, two main typ...

Combining pre-editing and post-editing to improve SMT of user-generated content

The poor quality of user-generated content (UGC) found in forums hinders both readability and machine-translatability. To improve these two aspects, we have developed human- and machine-oriented pre-editing rules, which correct or reformulate this content. In this paper we present the results of a study which investigates whether pre-editing rules that improve the quality of statistical machine t...

The ACCEPT Academic Portal: Bringing Together Pre-editing, MT and Post-editing into a Learning Environment

The ACCEPT Academic Portal is a user-centred online platform specifically designed to offer a complete machine translation workflow including pre-editing and post-editing steps for teaching purposes. The platform leverages technology developed in the ACCEPT European Project (2012-2014) devoted to improving the translatability of user-generated content. Originally available as a seri...

Rule-based Automatic Post-processing of SMT Output to Reduce Human Post-editing Effort

To enhance sharing of knowledge across the language barrier, the ACCEPT project focuses on improving machine translation of user-generated content by investigating pre- and post-editing strategies. Within this context, we have developed automatic monolingual post-editing rules for French, aimed at correcting frequent errors automatically. The rules were developed using the Acrolinx IQ technology, ...

Evaluation-guided pre-editing of source text: improving MT-tractability of light verb constructions

This paper reports an experiment on evaluating and improving the MT quality of light-verb constructions (LVCs) – combinations of a ‘semantically depleted’ verb and its complement. Our method uses construction-level human evaluation for systematic discovery of mistranslated contexts and creating automatic pre-editing rules, which make the constructions more tractable for Rule-Based Machine Translatio...

Publication date: 2014